Abstract. We derive a theory of data fusion based on an additive approach to Bayesian evidence
combination and accrual. Although the additive method can be stated in terms of simple formulae
of probability, it is surprisingly rich. It is robust against errors in data, and analysis and numerical
simulations indicate that estimated probabilities of hypotheses converge to the expected value of a
multiplicative Bayesian update as evidence that is mostly (but not necessarily entirely) correct is
accrued. We summarize the method and principal results in the first part of the paper. The method
relies on a representation theorem for expected values of uncertain probabilities that is an
extension of a theorem of de Finetti's (de Finetti, 1937). The extension states that the expected value
of a function of uncertain probabilities can be represented as a weighted sum of exchangeable
random variables. We use the extended theorem to show that the additive method approximates the
expected value of the ordinary Bayesian posterior, and that the two are equal in the limit. In the second
part of the paper, we sketch proofs of our theorems, derive the additive rule and contrast the
additive approach with others, especially multiplicative Bayesian updating on one hand and various
consensus-based rules on the other. We show that the additive approach is much less sensitive to
anomalous data than is Bayesian updating. The additive method, while similar in spirit to
consensus approaches, is not ad hoc.
1.0  Summary.

Many  automated  systems  combine,  or  accrue,  evidence  from  diverse  sources  to  develop  support 
for  hypotheses  about  a  state  of  nature.  In  image  analysis,  for  instance,  evidence  may  come  from 
imaging  sensors,  human  intelligence,  spatial  databases  and  signal  analysis  to  name  a  few  sources; 
hypotheses  can  correspond  to  the  presence  or  absence  of  objects  of  interest  at  particular  locations, 
and  perhaps  their  disposition.  Object  recognition  systems  typically  combine  the  output  of  multiple 
feature  detectors  to  support  hypotheses  about  the  existence  and  identification  of  a  target  in  a  single 
image. Geological survey systems, such as oil exploration systems, sometimes pool the opinions
of different experts to estimate the probability that a geological feature will be found in a
given region.

In  general,  accrual  methods  should  be  robust  against  occasional  errors  in  evidence  since  evidence 
sources  can  be  unreliable.  In  particular,  they  must  be  robust  against  errors  that  cannot  be  predicted 
from  properties  of  an  ensemble,  but  are  instead  peculiar  to  a  given  realization.  Consensus 
approaches  address  this  issue  directly  by  forming  a  consensus  among  evidence  sources.  The 
notion  is  that  while  one,  or  even  a  few,  sources  of  evidence  may  be  in  error,  the  majority  will  not 
be;  thus,  the  effect  of  outliers  can  be  minimized.  We  describe  a  method  for  accruing  evidence  to 
hypotheses  that  is  based  on  establishing  a  probabilistic  consensus  among  evidence  sources.  Our 
consensus  approach  is  based  on  a  Bayesian  theory  of  additive  accrual  that  is  robust  against  both 
missing  data  and  bad  evidence. 

Probabilistic evidence accrual involves inducing the probability of a hypothesis as evidence is
accumulated about a particular realization of a random process. Inference is relative to the state of
information or knowledge of the assessor (de Finetti, 1937), and should adapt to available data via
an inductive process that is not restricted to properties of an ensemble, but is also sensitive to the
unique characteristics of a realization. To support stable decision making, it is critical to make the
most robust Bayesian decision possible about a particular realization. Inductive probabilities can
be viewed as random variables whose estimates change with respect to the dynamic state of
information about a realization.


Our additive rule for accruing evidence to a hypothesis satisfies these criteria. The rule states that
when new evidence, E_N, is accrued to a hypothesis, H, that is already supported by existing
evidence, E_0, the updated probability is

\[ P(H \mid E_0 \cup E_N) = \frac{P(H)}{P(E_0 \cup E_N)} \left[ P(E_0 \mid H) + P(E_N \mid H)\,\bigl(1 - P(E_0 \mid H)\bigr) \right] \tag{1.1} \]

The right side of (1.1) indicates that P(H | E_0 ∪ E_N) is a scaled version of the likelihood of the old
evidence, P(E_0|H), supplemented by the likelihood of the new evidence, P(E_N|H), times whatever
evidence remains to be accrued, 1 − P(E_0|H). In (1.1), P(H) is a prior probability whose value is not
based on any of the E_i. Usually E_0 is itself the union of previous experiments, E_0 = E_1 ∪ ... ∪ E_{N-1}.
The internal structure of the E_i may be arbitrarily complex. However, a convenient way to view
the output of E_i is as a 0-1 random variable, where E_i = 1 corresponds to the statement that "The
evidence obtained from the i-th source confirms H." We note that the rule, while additive, can lead
to decreasing probability for H if P(E_N|H) is small with respect to P(E_0 ∪ E_N).
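
As a minimal sketch of how (1.1) might be iterated in practice, the following Python accrues one source at a time. The function names and the example numbers are illustrative. The sketch also assumes that likelihoods under the complement of H are available and conditionally independent, which is used here only to form the normalizer P(E_0 ∪ E_N); that is an assumption of the sketch and is not stated in the text above.

```python
def accrue(union_lik, new_lik):
    """Bracketed term of (1.1): P(E_0 u E_N | .) from P(E_0 | .) and P(E_N | .)."""
    return union_lik + new_lik * (1.0 - union_lik)


def additive_posterior(prior, liks_h, liks_not_h):
    """P(H | E_1 u ... u E_n) by repeated use of (1.1).

    liks_h[i]     = P(E_i | H)
    liks_not_h[i] = P(E_i | not H)  (assumed available for the normalizer)
    """
    u_h = 0.0      # running P(E_1 u ... u E_k | H)
    u_not_h = 0.0  # running P(E_1 u ... u E_k | not H)
    for l_h, l_n in zip(liks_h, liks_not_h):
        u_h = accrue(u_h, l_h)
        u_not_h = accrue(u_not_h, l_n)
    evidence = prior * u_h + (1.0 - prior) * u_not_h  # P(E_1 u ... u E_n)
    if evidence == 0.0:
        return prior  # no source can fire, so nothing is learned
    return prior * u_h / evidence


if __name__ == "__main__":
    two = additive_posterior(0.5, [0.8, 0.7], [0.2, 0.3])
    # A third source that is far more likely under "not H" lowers the posterior.
    three = additive_posterior(0.5, [0.8, 0.7, 0.05], [0.2, 0.3, 0.9])
    print(two, three)
```

Each new source costs one multiply-add per running likelihood, and the result does not depend on the order in which the sources are accrued.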


To obtain (1.1) we apply Bayes' Rule and only require that the experiments be conditionally
independent, i.e. that P(E_i E_j | H) = P(E_i|H) P(E_j|H) if i ≠ j. However, to fully understand (1.1) it is useful
to note that the experiments are exchangeable (de Finetti, 1937) since exchangeability is implied
by independence. A collection of events is exchangeable when their joint distribution depends
only on the number of events and not on either their order or the specific events. In the case of an
infinite sequence of exchangeable 0-1 random variables, E_1, E_2, ..., de Finetti's Representation
Theorem states that there exists a unique probability distribution function Φ on [0,1] such that

\[ P^n_r = \binom{n}{r} \int_0^1 p^r (1-p)^{n-r} \, d\Phi(p) \tag{1.2} \]

where P^n_r is the probability of obtaining r 1's out of a collection of experiments E_i, i = 1, ..., n.
Therefore, an exchangeable sequence of random variables can be viewed as a mixture of
independent random variables each with constant probability p.


To motivate the additive rule, we first extend de Finetti's Theorem to expected values of
functions, f(p), of uncertain probabilities, p. We show in Section 2.4 that for p ∈ [0,1], continuous
f(p) and distribution function Φ(p),

\[ E[f] = \int_0^1 f(p) \, d\Phi(p) = \lim_{n \to \infty} \sum_{r=0}^{n} f\!\left(\frac{r}{n}\right) P^n_r \tag{1.3} \]

where P^n_r is the probability in (1.2). A special case of the extension is f(p) = p, for which we
have

\[ E[p] = \lim_{n \to \infty} \sum_{r=0}^{n} \frac{r}{n} \, P^n_r \tag{1.4} \]
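
The limit in (1.3) can be checked numerically for a concrete mixing distribution. The sketch below assumes Φ is a Beta(a, b) distribution, a choice made here purely for illustration, so that P^n_r is the beta-binomial probability of r ones in n trials. The function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def p_r_n(r, n, a, b):
    """P^n_r when the mixing distribution is Beta(a, b) (illustrative choice)."""
    return comb(n, r) * exp(log_beta(a + r, b + n - r) - log_beta(a, b))


def expected_f(f, n, a, b):
    """Finite-n sum from (1.3): sum over r of f(r/n) * P^n_r."""
    return sum(f(r / n) * p_r_n(r, n, a, b) for r in range(n + 1))


if __name__ == "__main__":
    a, b = 2.0, 3.0
    for n in (10, 100, 400):
        # For Beta(2, 3): E[p] = 0.4 and E[p^2] = 0.2.
        print(n, expected_f(lambda p: p, n, a, b), expected_f(lambda p: p * p, n, a, b))
```

For f(p) = p the finite sum already equals the mean of Φ at every n, while for a nonlinear f such as p^2 the sum approaches the expected value as n grows. Keep n to a few hundred here, since the binomial coefficient is converted to a float.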


We also show that P(E_0 ∪ E_N | H) approaches the expected value of the joint distribution of the
uncertain evidence sources E_1, ..., E_n. Here uncertainty means that the E_i's are random variables,
hence so is P(E_i|H). More specifically, we let ℰ^n = { (E_1 ... E_n) | at least one E_i = 1 }; it is a set of
random variables whose elements are possible values of evidence combined conjunctively. Let E^n
be any element of ℰ^n. Then

\[ \lim_{n \to \infty} \Bigl[ E\bigl[ P(H \mid E^n) \bigr] - P(H \mid E_1 \cup \dots \cup E_n) \Bigr] = 0 \tag{1.5} \]


We calculate the expected value in 2 stages. First, we average over all realizations in ℰ^n. The
probability of a given realization is P(E_1 = 1, E_2 = 0, ... | H) = P(E_1|H)(1 − P(E_2|H)) ... . Second, by
considering all possible combinations and increasing n, we get a large sample of the uncertain
joint probabilities. The method implicitly partitions the joint probabilities into histogram bins. We
quickly obtain a good approximation of the distribution function of the uncertain probabilities, Φ,
through this implicit histogramming.


Before contrasting the additive rule with other methods of evidence combination, we discuss an
important distinction between types of evidence. Evidence for a hypothesis can be strong or
weak. Strong evidence is always present in data sets that satisfy a hypothesis, while weak
evidence may be absent. Evidence used in object recognition systems illustrates this distinction.
Suppose the problem is to identify automobiles in some class of images. While it is true that almost
all autos have wheels, it is not certain that wheels can be seen in every image of every auto.
Wheels may be hidden or unobservable depending on the angle of view. In many practical
problems, weak evidence is the dominant form of evidence, because the ability to observe
nearly all evidence is contingent on the realized data set, as the wheel example indicates. Our
additive rule reflects that view since the total evidence for a hypothesis is the normalized sum of
evidence obtained from many sources. The report of a single source, or even a few sources, is not
enough to completely alter P(H | ∪_i E_i) when it contradicts the majority of sources.


We  demonstrate  that  the  additive  approach  to  accrual  (1.1)  is  much  more  stable  with  regard  to 
step-wise  variations  in  evidence  than  is  a  multiplicative  approach  like  standard  Bayesian  updating 
(cf.  Duda  and  Hart,  1973).  Stability  in  evidence  accrual  systems  is  not  just  a  matter  of  long-term 
behavior.  In  many  systems,  decisions  must  be  based  on  interim  probabilities.  If  probabilities 
swing widely from one update to another, such systems may behave erratically. Robust
techniques like the additive rule are inherently stable and appropriate for small data sets. The additive
rule  is  also  inherently  parallel  and  opportunistic.  It  is  parallel  because  exchangeable  evidence 
sources  can  be  processed  in  any  order,  and  it  is  opportunistic  because  evidence  can  be  processed  if 
and  when  it  becomes  available. 
Contrast the robustness of (1.1) with multiplicative Bayesian updating,

\[ P(H \mid E_0 E_N) = \frac{P(E_N \mid H)}{P(E_N \mid E_0)} \, P(H \mid E_0) \tag{1.6} \]

where E_0 is now the intersection of the results of previous experiments and we have used
conditional independence to simplify Bayes' rule (cf. Duda and Hart, 1973). Obviously P(H | E_0 E_N) will
be small if P(E_N|H)/P(E_N|E_0) is small. For instance, P(H | E_1 ... E_n) = 0 if any P(E_i|H) = 0. To
continue the example given earlier, if E_i is an experiment designed to detect wheels in an image but
no wheels are visible, then (1.6) sends P(H | E_1 ... E_n) to zero even if the image contains an auto.
Note that this can happen when all other experiments, e.g. ones that look for doors, tail lights,
bumpers, etc., return P(E_i|H) = 1. This seems unreasonable given the contingent nature of data.
Furthermore, an experiment can return low P(E_i|H) even if the data on which it is based satisfies
H; all that is required is a bad experiment, for instance one that does not identify wheels well
under some conditions.
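
The wheel example can be made concrete with a small sketch. It compares only the likelihood terms that drive the two rules: the product of the P(E_i|H), which appears in the numerator of the multiplicative posterior under conditional independence, and the accumulated bracketed term of (1.1). The function names and the numbers are illustrative assumptions.

```python
def multiplicative_support(liks):
    """Product of P(E_i | H): likelihood of the conjunction E_1 ... E_n."""
    out = 1.0
    for lik in liks:
        out *= lik
    return out


def additive_support(liks):
    """P(E_1 u ... u E_n | H), accumulated as in the bracketed term of (1.1)."""
    union = 0.0
    for lik in liks:
        union += lik * (1.0 - union)
    return union


if __name__ == "__main__":
    # doors, tail lights, wheels (occluded, so the detector reports 0), bumpers
    liks = [0.9, 0.9, 0.0, 0.9]
    print(multiplicative_support(liks))  # a single zero annihilates the product
    print(additive_support(liks))        # the union term is unaffected by the zero
```

A single failed detector removes all multiplicative support, whereas it simply contributes nothing to the additive term.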


Two related methods for overcoming the brittleness of multiplicative updating are often proposed.
One approach amounts to computing the values of all possible realizations of the joint probability
P(H | E_1 = e_1, ..., E_n = e_n), where e_i = 1 or 0, and using some subset of those probabilities to evaluate the
certainty in H. Unfortunately, such an approach can be computationally expensive since its
inherent complexity is on the order of 2^n. The additive rule (1.1), on the other hand, has
complexity O(n), and furthermore computes the expected value of P(H | E_1 = e_1, ..., E_n = e_n), as we have noted.

The second approach supposes that observational contingencies, for instance the probability of
occlusion in image analysis, can themselves be modeled. Note that a model of occlusion would
require 1) completely parametrizing the process of occlusion, including the size and shape of the
occluding object, its position relative to the target, the size and shape of the target, the imaging
geometry and so on, and 2) determining the values of the parameters for a given realization. In the
unlikely event that a complete parameterization of occlusion could be obtained, many models
would have to be stored and computed. In most cases, determining the values of the parameters
of an observational model requires solving a problem of equivalent complexity to the original
problem.


Numerical results provide striking evidence for the robustness of the additive update rule versus
the multiplicative update rule. Figures 1.1 and 1.2 are plots of the results of choosing component
probabilities at random and then inserting them into formula (1.1) for additive updating and
formula (1.6) for multiplicative updating. The component probabilities are the probabilities P(E_0)
and P(E_N) of getting evidence E_0 or E_N, the joint probability P(E_0 E_N) of getting E_0 and E_N, the
joint probabilities P(E_0 H) and P(E_N H) of getting E_0 and the hypothesis or E_N and the hypothesis,
and finally the joint probability P(E_0 E_N H) of getting evidence E_0, E_N and the hypothesis.


The component probabilities for two or more evidence gathering experiments and a hypothesis
were chosen recursively at random, except that they were required to satisfy measure-theoretic
constraints. The constraints are the following:

\[ P(E_0 E_N) \le \min\bigl( P(E_0), P(E_N) \bigr) \tag{1.7} \]

\[ P(E_i H) \le P(E_i), \quad i = 0, N \tag{1.8} \]

\[ P(E_0 E_N H) \le \min\bigl( P(E_0 H), P(E_N H) \bigr) \tag{1.9} \]

The forms of the multiplicative and additive rules used in the simulations are equivalent to our
previous forms but allow these random probabilities to be input. The form of the multiplicative update rule that we used is

(1.10) 

while the form of the additive update rule is

\[ P(H \mid E_0 \cup E_N) = \frac{P(E_0 H) + P(E_N H) - P(E_0 E_N H)}{P(E_0) + P(E_N) - P(E_0 E_N)} \tag{1.11} \]

Simulations show the results of iterating the additive and multiplicative rules for 3, 17, and
129 experiments; that is, we initialize each simulation with a randomly chosen E_0 and then accrue
the results of 2, 16 or 128 new experiments whose values are also chosen at random. Values were
selected from a uniform distribution on [0,1] with restrictions given by (1.7)-(1.9). Figure 1.1 is a
plot of the results from 100,000 runs of the additive rule and Figure 1.2 is a plot using the same
number of runs for the multiplicative rule. The central limit theorem-type convergence with
increasing number of experiments, which is expected from our theorem (1.5), is clearly evident in
Figure 1.1, while no difference is discernible in Figure 1.2. Also, the multiplicative rule shows the
same tendency toward low values of the iterated update in every case, while the additive updates
converge toward what one would expect for random choices of hypothesis and evidence probabilities.
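
A single step of such a simulation might be sketched as follows. Several choices here are assumptions of the sketch rather than the procedure used for the figures: instead of drawing each component uniformly and enforcing (1.7)-(1.9), it draws a random joint distribution over the eight outcomes of (E_0, E_N, H), which satisfies those constraints automatically; it performs one update rather than chaining 2, 16 or 128 of them; and it takes the multiplicative update to be the conditional-probability identity P(H | E_0 E_N) = P(E_0 E_N H) / P(E_0 E_N). All function and key names are hypothetical.

```python
import random


def sample_components(rng):
    """Six component probabilities from one random joint law of (E_0, E_N, H)."""
    weights = [rng.random() + 1e-12 for _ in range(8)]  # offset avoids a zero atom
    total = sum(weights)
    outcomes = [(e0, en, h) for e0 in (0, 1) for en in (0, 1) for h in (0, 1)]
    atoms = {o: w / total for o, w in zip(outcomes, weights)}

    def prob(e0=None, en=None, h=None):
        # Marginalize over every argument left as None.
        return sum(
            p for (a, b, c), p in atoms.items()
            if (e0 is None or a == e0)
            and (en is None or b == en)
            and (h is None or c == h)
        )

    return {
        "E0": prob(e0=1), "EN": prob(en=1), "E0EN": prob(e0=1, en=1),
        "E0H": prob(e0=1, h=1), "ENH": prob(en=1, h=1),
        "E0ENH": prob(e0=1, en=1, h=1),
    }


def additive_update(c):
    """Equation (1.11): P(H | E_0 u E_N) from joint probabilities."""
    return (c["E0H"] + c["ENH"] - c["E0ENH"]) / (c["E0"] + c["EN"] - c["E0EN"])


def multiplicative_update(c):
    """Assumed form: P(H | E_0 E_N) = P(E_0 E_N H) / P(E_0 E_N)."""
    return c["E0ENH"] / c["E0EN"]


def simulate(runs, seed=0):
    """Return lists of additive and multiplicative updates for `runs` random draws."""
    rng = random.Random(seed)
    draws = [sample_components(rng) for _ in range(runs)]
    return [additive_update(c) for c in draws], [multiplicative_update(c) for c in draws]


if __name__ == "__main__":
    add, mult = simulate(10000)
    print(sum(add) / len(add), sum(mult) / len(mult))
```

Histogramming the two returned lists gives a rough one-step analogue of the plots described above.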
The conservatism of the additive rule is also evident when we compare likelihood ratios based on
the two rules (Figure 1.3). There we contrast the relative effect of new evidence on likelihood
ratios,

\[ \Lambda_M = \frac{P(E_N \mid H) \, P(E_0 \mid H)}{P(E_N \mid \bar{H}) \, P(E_0 \mid \bar{H})} \tag{1.12} \]

derived from the Multiplicative Rule and,

\[ \Lambda_A = \frac{P(E_0 \mid H) + P(E_N \mid H) \, \bigl(1 - P(E_0 \mid H)\bigr)}{P(E_0 \mid \bar{H}) + P(E_N \mid \bar{H}) \, \bigl(1 - P(E_0 \mid \bar{H})\bigr)} \tag{1.13} \]

derived from the Additive Rule. To simplify the discussion, we have dropped the ratio P(H)/P(H̄)
that multiplies both Λ_M and Λ_A.


To obtain Figure 1.3, we suppose that both rules start with the same prior likelihood,

\[ \Lambda_0 = \frac{P(E_0 \mid H)}{P(E_0 \mid \bar{H})} \tag{1.14} \]

We also fix P(E_0|H) = 0.25, since the Additive Rule requires those values explicitly; we take
P(E_N|H̄) = 0.25 and we let Λ_0 = 1.0, i.e. the prior likelihood is indifferent between H and H̄. Other
values of P(E_0|H), P(E_N|H̄) and Λ_0 yield qualitatively similar results, a fact that we discuss at
length in Section 4. Clearly Λ_A is restricted to a narrower range than Λ_M. Furthermore, Λ_M drops
to values less than 1 very quickly. Large (small) values of P(E_N|H) lead to Λ_M that is much larger
(smaller) than Λ_0, while Λ_A stays closer to Λ_0 throughout the range of P(E_N|H). This is
consistent with our earlier observation that the Additive Rule gives much less weight to outliers than
does the Multiplicative Rule.
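
The comparison can be reproduced in outline with a few lines of Python. The sketch evaluates (1.12) and (1.13) over a sweep of P(E_N|H). The default values assume P(E_0|H) = 0.25 with Λ_0 = 1.0 (so the likelihood of E_0 under the complement of H is also 0.25) and a likelihood of 0.25 for E_N under the complement of H; the function and parameter names are illustrative.

```python
def likelihood_ratios(p_en_h, p_en_nh=0.25, p_e0_h=0.25, p_e0_nh=0.25):
    """Return (Lambda_M, Lambda_A) from (1.12) and (1.13).

    p_en_h  = P(E_N | H),  p_en_nh = P(E_N | not H)
    p_e0_h  = P(E_0 | H),  p_e0_nh = P(E_0 | not H)
    """
    lam_m = (p_en_h * p_e0_h) / (p_en_nh * p_e0_nh)
    lam_a = (p_e0_h + p_en_h * (1.0 - p_e0_h)) / (p_e0_nh + p_en_nh * (1.0 - p_e0_nh))
    return lam_m, lam_a


if __name__ == "__main__":
    for k in range(11):
        x = k / 10.0
        lam_m, lam_a = likelihood_ratios(x)
        print(f"P(E_N|H)={x:.1f}  Lambda_M={lam_m:.3f}  Lambda_A={lam_a:.3f}")
```

Under these assumed values Λ_M runs from 0 to 4 across the sweep, while Λ_A stays between roughly 0.57 and 2.29; the two coincide at Λ_0 = 1 when P(E_N|H) = 0.25.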
[Figure 1.3. Horizontal axis: P(E_N given H), from 0 to 1.0. Parameters: Λ_0 = 1.0, P(E_0|H) = 0.25.]

Figure 1.3 - Additive vs. Multiplicative Rule: Low P(E_0|H)

Although the additive rule is appropriate in many practical problems, the additive and
multiplicative rules can be combined to yield

\[ P\bigl(H \mid E_S (E_1 \cup E_2)\bigr) = \frac{P(E_S \mid H)}{P(E_S)} \, P(H \mid E_1 \cup E_2) \tag{1.15} \]

The combined rule looks like Bayes' rule with P(H | E_1 ∪ E_2) as a prior. The effect of strong
evidence, E_S, is to scale P(H | E_1 ∪ E_2) by the Bayesian likelihood of E_S, P(E_S|H)/P(E_S). We give an
algorithm for (1.15) below.


Additive approaches to updating are not new. Our current work is related to Jeffrey's rule (Jeffrey,
1965; Diaconis and Zabell, 1982) and can in fact be used to derive his rule when the E_i form a
partition of the entire sample space. Our rule is closely related to the updating methods discussed
by Winter, Ryan and Hunt (1986), but they neglected the normalization, which is critical if
P(H | ∪_i E_i) is to decrease as well as increase as evidence is accrued. The activation of a "neuron" in
an artificial neural system is usually based on a weighted consensus of other neurons, and
equation (1.1) can be rewritten to look like the activity of such a neuron. When written in that form,
(1.1) suggests an adaptive method for learning the properties of the transformations P(E_i|H).
Consensus rules (Berenstein, Kanal and Lavine, 1986; deGroot, 1974) use weighted sums of
probabilities to represent support for hypotheses.


The remainder of the paper is essentially a set of appendices to this summary. It is organized as
follows: In Section 2 we discuss assumptions and define a few terms, specifically i) conditional
independence, ii) exchangeability, iii) weak and strong evidence. We also restate theorems (1.3)
and (1.5) more formally and sketch their proofs. Proofs are given in full in (Stein and Winter, in
prep.). In Section 3 we derive (1.1) in 4 different ways because each derivation illustrates a
different aspect of the rule. The first derivation depends on straightforward applications of Bayes' Rule
and conditional independence. We use the second derivation to show that (1.1) decreases when
new evidence does not strongly support H. The third derivation is the basis for the proof of our
second theorem (1.5). The fourth derivation relates (1.1) to the expected value of an indicator
function.


In Section 4 we compare the properties of the additive rule and multiplicative rule through
numerical results similar to those of Figures 1.1-1.3. The distribution of simulated updates of the
additive rule tightens as the number of experiments increases while the distribution of multiplicative
updates does not change. We also discuss the effect of new data on the likelihoods Λ_A and Λ_M. The
additional results further confirm Figure 1.3: Λ_M is less stable than Λ_A in the sense that small
differences in new data can result in much larger changes in Λ_M than in Λ_A. Additionally, we note that
Λ_A is affected by the magnitude of P(E_0|H) and thus preserves some information about the
absolute goodness of the hypothesis while Λ_M loses such information.

In Section 5 we relate the additive rule to other additive approaches, specifically consensus rules,
Jeffrey's rule and neural networks. We show that the additive rule is identical to Jeffrey's rule when
the evidence sources, {E_i}, constitute a partition of the sample space. Section 6 outlines a few
issues for future research.

2  Background. 

A word about notation: Where it is not ambiguous we use X to indicate that X = 1 and X̄ to
indicate X = 0.

We  are  interested  in  problem  domains  in  which  a  collection  of  diverse  algorithms,  or  experiments, 
can  report  evidence  about  a  hypothesis.  In  many  cases  only  a  subset  of  experiments  may  report, 
and  furthermore  experiments  may  report  in  any  order.  We  assume  that  experiments  are  basically 
good in the sense that they do discriminate between H and H̄ in the absence of contingent errors.
Specifically, this means P(Ē|H) ≪ P(E|H) and P(E|H̄) ≪ P(Ē|H̄). Although we assume
experiments are good discriminators, any individual experiment in a realized sequence of experiments
may be unreliable. That is, the output of an experiment may not conform to the true state of nature
because of a variety of error sources. On the other hand, we assume that most experiments agree
when they are applied to a given event.

2.1  Conditional  Independence 

Our updating rule is based on the rather weak assumption that experiments are conditionally
independent of each other. This amounts to claiming that P(E_i E_j | H) = P(E_i|H) P(E_j|H), or equivalently,
that P(E_i | H E_j) = P(E_i|H). In most cases these are reasonable claims about the parameterization of
the distribution, P. The second, for instance, says that knowing H suffices to define P, and that
additional evidence, E_j, is not useful in parametrizing P.

To get some intuition about conditional independence, consider the case of flipping a coin and
trying to predict whether the i-th flip will be a head. Suppose H is the statement "The coin is fair,"
and E_j is a set of flips performed previously. P(E_i | H E_j) amounts to asking, "What is the
probability of getting a head (or tail) on the i-th flip given that the coin is fair and we have already obtained
the sequence of heads and tails contained in E_j?" Clearly the evidence, E_j, adds nothing to the
definition of this probability. To determine the probability of E_i, all we need to know is that the coin
is fair, i.e. P(E_i | H E_j) = P(E_i|H) = 1/2.


2.2  Exchangeability 

A sequence of random variables is exchangeable if the joint probability P satisfies

\[ P(E_1 = e_1, E_2 = e_2, \dots, E_n = e_n) = P(E_{\pi(1)} = e_1, E_{\pi(2)} = e_2, \dots, E_{\pi(n)} = e_n) \tag{2.1} \]

where π is a permutation on n indices. This type of probability measure is called symmetric and
has been studied by de Finetti (1937, 1964), and was fully treated by Hewitt and Savage (1955).
Another way to describe a sequence of exchangeable random variables is to say that the order
does not matter to the limiting joint probability distribution.
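
Condition (2.1) is easy to check on a small example. The sketch below uses a Pólya urn, an illustration chosen here and not taken from the text: a ball is drawn, then returned together with one more ball of the same color. Successive draws are dependent, yet every reordering of a given 0-1 sequence has the same probability. Exact rational arithmetic is used so the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import permutations


def urn_sequence_probability(seq, red=1, black=1):
    """Probability of a 0-1 draw sequence from a Polya urn (1 means red)."""
    prob = Fraction(1)
    for e in seq:
        total = red + black
        if e == 1:
            prob *= Fraction(red, total)
            red += 1    # return the ball plus one more red
        else:
            prob *= Fraction(black, total)
            black += 1  # return the ball plus one more black
    return prob


# Probability of each distinct ordering of two reds and one black.
probs = {perm: urn_sequence_probability(perm) for perm in set(permutations((1, 1, 0)))}

if __name__ == "__main__":
    for perm, p in sorted(probs.items()):
        print(perm, p)
    # The draws are exchangeable but not independent:
    print(urn_sequence_probability((1, 1)), urn_sequence_probability((1,)) ** 2)
```

All three orderings receive the same probability, while P(1, 1) differs from P(1)^2, so the sequence is exchangeable without being independent.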

An important characteristic of many types of evidence sources is that the order of receipt of
evidence should not affect the conditional probability of the hypothesis given this evidence. This is
important because the order of receipt of evidence may not be the same as the time ordering of the
evidence, and for certain types of evidence the time ordering is not significant. For example,
suppose we are trying to identify an automobile and we receive evidence that it has a convertible top
and then we receive evidence that it has wire wheels. The order in which we combine the evidence
should make no difference to the conditional probability that we have a specific kind of
automobile given the evidence. Our updating scheme leads us to consider exchangeable random
variables.

As shown by de Finetti (1937, 1964) and Hewitt and Savage (1955), a symmetric measure may be
represented more simply as a mixture of independent power distributions. The mixture is created
by integrating the power distributions over a random probability distribution (see Dubins and
Freedman (1967)) on the power distributions. That is,

\[ P(E_i \in A_i, \; i = 1, \dots, n) = \int \prod_{i=1}^{n} P(A_i) \, d\mu(P) \tag{2.2} \]

Note that μ(P) is a random probability measure over the set of probability
measures on the sample space Ω of the random variables. This result, usually called de Finetti's
Representation Theorem or just the Representation Theorem, takes a simpler form in the case of 0-1
or Bernoulli random variables. That is, there exists a unique probability measure μ on the Borel sets
of [0,1] such that

\[ P(E_1 = e_1, \dots, E_k = e_k) = \int_{[0,1]} p^{j} (1-p)^{k-j} \, \mu(dp) \tag{2.3} \]

where e_i is either 0 or 1 and j = Σ_i e_i.

A variant of the Representation Theorem holds even if the sequence is finite. Suppose that k is
much smaller than n and E_1, ..., E_k is the beginning of a long exchangeable sequence
E_1, ..., E_k, E_{k+1}, ..., E_n. In that case, (2.3) is approximately true with an error that is essentially on the
order of k/n, as shown by Diaconis and Freedman (1980).


2.3  Updating  With  Weak  Evidence 

An important distinction between the additive update procedure and the typical multiplicative
update is that we consider the union, or disjunction, of evidence, whereas the typical scheme looks
only at the intersection, or conjunction. Notice that the union of evidence includes the intersection
as well as other regions of the sample space that have not been covered by previous evidence.
Evidence comes in one of two forms, and the form is a guide as to whether the update should be done
using union or intersection of the new evidence with the old. These two forms we call weak and
strong evidence.

Strong  evidence  is  a  probabilistic  statement  about  a  condition  that  must  or  must  not  be  satisfied  by 
every  realization  of  a  random  process.  For  example,  when  trying  to  recognize  an  automobile  in 
imagery,  it  is  useful  to  remember  that  they  are  practically  never  found  in  water.  Strong  evidence, 
such  as  the  fact  that  an  object  is  by  itself  in  the  middle  of  a  deep  lake,  should  allow  us  to  conclude 
that it is not an auto. Because strong evidence refers to conditions all of which must be
considered, the conjunction of the strong evidence is appropriate and this leads to the normal method for
multiplicative Bayesian updating.

Weak evidence is a probabilistic statement about a condition that may or may not be satisfied. In
the auto example, the size of an object may imply that it is a car. But several types of trucks that
could be in the scene may be the same size as a car. Also, the possibility of occlusion of a critical
component such as wheels requires that we not draw conclusions from the absence of a
component. Because weak evidence refers to conditions only some of which may be considered, the
disjunction of the evidence is appropriate and this leads to the method of additive updating we
discuss in this paper.
Although the two types of updating may be combined (see Section 3.2), we believe that most
evidence is weak evidence. The uncertainty associated with information gathering algorithms and
processes always allows for the possibility that critical components are missed. Also, as was
learned from the knowledge representation activity that went on in AI research for many years, it
is very difficult to completely and uniquely identify objects and situations by a reductionist listing
of the attributes or components that must make up the object or situation. Thus, a lot of evidence
is weak because the object or situation that it is applied to is not uniquely or completely specified
by components about which information can be gathered.


Furthermore, evidence can be weak simply because experiments fail. We say a supporting
experiment fails if P(E_i | H = T) is small. Supporting experiments can fail for at least 2 reasons. First, a
data set may be a member of H yet it may not contain data to support the experiment. This is the
problem of missing data. A very common example is the effect of occlusion in image analysis; an
experiment designed to recognize human faces may fail on an individual face if the subject wears
a stocking cap pulled low on his head, thus obscuring ears and eyebrows. No matter how good an
experiment may be, it must fail if the data on which it is based is missing. Second, it must be
admitted that experiments can fail just because they are bad, i.e. an experiment may not correctly
classify a data set even when the data to support the experiment is available. We call this the
problem of systematic error in a supporting experiment.


2.4  Theorems 

As  noted,  we  just  state  our  theorems  here  and  sketch  their  proofs.  We  give  complete  proofs  in 
(Stein  and  Winter,  in  prep.) 

Theorem 1: Extended Representation Theorem. For $p \in [0,1]$, any continuous function $f(p)$ and a distribution function $\Phi(p)$,

$$E[f] = \lim_{n \to \infty} \sum_{r=0}^{n} f\!\left(\frac{r}{n}\right) P_{r,n} \tag{2.4}$$

where $P_{r,n}$ is the probability of obtaining r successes in n trials selected from a population of exchangeable random variables.


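The theorem states that the expected value of any function of an induced probability can be represented in terms of exchangeable variables. It follows from a few simple facts. First, obviously

$$E[f] = \int_0^1 f(p)\, d\Phi(p). \tag{2.5}$$

Next we can rewrite $f(p)$ in terms of its Bernstein series,

$$f(p) = \lim_{n \to \infty} \sum_{r=0}^{n} f\!\left(\frac{r}{n}\right) \binom{n}{r} p^r (1-p)^{n-r} \tag{2.6}$$

so

$$E[f] = \int_0^1 \lim_{n \to \infty} \sum_{r=0}^{n} f\!\left(\frac{r}{n}\right) \binom{n}{r} p^r (1-p)^{n-r}\, d\Phi(p). \tag{2.7}$$

We apply uniform convergence to exchange the limit and integral in (2.7) and then use deFinetti's Theorem to get

$$E[f] = \lim_{n \to \infty} \sum_{r=0}^{n} f\!\left(\frac{r}{n}\right) \int_0^1 \binom{n}{r} p^r (1-p)^{n-r}\, d\Phi(p) = \lim_{n \to \infty} \sum_{r=0}^{n} f\!\left(\frac{r}{n}\right) P_{r,n}. \tag{2.8}$$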
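The representation $E[f] \approx \sum_r f(r/n) P_{r,n}$ can be checked numerically for a concrete mixing distribution. The sketch below is only an illustration: it assumes that $\Phi$ is a Beta(a, b) distribution (so that $P_{r,n}$ is the beta-binomial probability) and takes $f(p) = p^2$, for which the exact expectation is known in closed form. The function names and the choice of Beta(2, 3) are hypothetical and do not come from the paper.

```python
import math


def beta_binomial_pmf(r, n, a, b):
    """P_{r,n} when p is drawn from an assumed Beta(a, b) mixing distribution."""
    log_comb = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    log_num = math.lgamma(r + a) + math.lgamma(n - r + b) - math.lgamma(n + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp(log_comb + log_num - log_den)


def representation_sum(f, n, a, b):
    """Finite-n version of the sum over r of f(r/n) * P_{r,n}."""
    return sum(f(r / n) * beta_binomial_pmf(r, n, a, b) for r in range(n + 1))


def square(p):
    return p * p


a, b = 2.0, 3.0
exact = a * (a + 1) / ((a + b) * (a + b + 1))  # E[p^2] under Beta(a, b)
approx = representation_sum(square, 2000, a, b)
print(exact, approx)
```

With n = 2000 the finite sum should already agree with the exact value to about three decimal places; larger n tightens the agreement, as the limit in the theorem suggests.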


Corollary. If $f(p) = p$,

$$E[p] = \lim_{n \to \infty} \sum_{r=0}^{n} \frac{r}{n}\, P_{r,n}. \tag{2.9}$$


Theorem 2: Expected Multiplicative Update. Let $\mathcal{E}^n = \{(E_1, \ldots, E_n) \mid \text{at least one } E_i = 1\}$; it is a set of random variables whose elements are possible values of evidence combined multiplicatively. Let $E^n$ be any element of $\mathcal{E}^n$. Then

$$\lim_{n \to \infty} \left[ E[P(H \mid E^n)] - P(H \mid E_1 \cup \ldots \cup E_n) \right] = 0. \tag{2.10}$$

We denote $E_j = 1$ by $E_j$ and $E_j = 0$ by $\bar{E}_j$. The critical term in the additive rule is $P(E_1 \cup \ldots \cup E_n \mid H)$, which may be re-written

$$P(E_1 \cup \ldots \cup E_n \mid H) = \sum_{r=1}^{n} \sum_{\pi \in S_{r,n}} P(E_{i_1} \ldots E_{i_r} \bar{E}_{i_{r+1}} \ldots \bar{E}_{i_n} \mid H) \tag{2.11}$$

where $S_{r,n}$ is the set of all permutations that contain r 1's.

Since at least one $E_j = 1$ in every term on the right of (2.11), we have

$$\sum_{r=1}^{n} \sum_{\pi \in S_{r,n}} P(E_{i_1} \ldots E_{i_r} \bar{E}_{i_{r+1}} \ldots \bar{E}_{i_n} \mid H) = \sum_{r=1}^{n} \sum_{\pi \in S_{r,n}} \prod_{l=1}^{r} p_{i_l} \prod_{j=r+1}^{n} (1 - p_{i_j}). \tag{2.12}$$

Here we use exchangeability and substitute $p_j$ for $P(E_j = e_j = 1 \mid H)$ and $1 - p_j$ for $P(E_j = e_j = 0 \mid H)$. We continue with

[Equation (2.13), the continuation of this expansion, is not legible in the source.]

The products of the $p_i$ and $(1 - p_j)$ give us various estimates of the probability of obtaining r successes in n trials. After we histogram them into m bins we compute the frequency of each bin, $\phi_\lambda$. Then we write

[Display equation not legible in the source: the sum over r is rewritten as a sum over the bins $\lambda = 1, \ldots, m$, weighted by $\phi_\lambda$.]

where $p_\lambda$ is the value in the $\lambda$th bin. By letting $m \to \infty$ and assuming that the empirical density, $\phi_\lambda$, goes to the actual density $d\Phi$, we have (2.9), and so are done.


3  Additive  Update  Rule 

First  we  state  the  updating  rule,  and  then  derive  it  in  4  different  ways.  The  second  subsection 
describes  how  the  update  rule  can  decrease  belief  with  new  evidence.  The  third  subsection  dis¬ 
cusses  a  rule  that  combines  the  additive  and  multiplicative  update  rules  for  use  in  applications 
with  both  strong  and  weak  evidence. 
3.1 Statement of the Rule and Derivations

We  state  the  rule  in  a  form  that  satisfies  all  4  derivations.  In  particular  we  require  that  experiments 
be  conditionally  independent  and  that  they  be  exchangeable.  These  fairly  weak  assumptions  will 
be  met  by  most  probabilistic  accrual  systems.  However,  individual  derivations  may  actually  allow 
even  weaker  assumptions.  For  instance,  our  first  and  second  derivations  do  not  require  exchange¬ 
ability.  We  note  such  points  in  the  remarks  following  each  derivation. 

Additive Rule for Weak Evidence. If $E_0$ and $E_N$ are sets of experiments that are independent when conditioned on a hypothesis, H, and if the prior probability $P(H) \neq 0$, then the updated probability of H given that $E_0$ has been supplemented by $E_N$ is

$$P(H \mid E_0 \cup E_N) = \frac{P(H)}{P(E_0 \cup E_N)} \left[ P(E_0 \mid H) + P(E_N \mid H)\,(1 - P(E_0 \mid H)) \right]. \tag{3.1}$$

The rule states that the updated probability, $P(H \mid E_0 \cup E_N)$, depends on the sum of $P(E_0 \mid H)$ with $P(E_N \mid H)(1 - P(E_0 \mid H)) = P(E_N \bar{E}_0 \mid H)$. Although this sum is always positive, $P(H \mid E_0 \cup E_N)$ can decrease through the influence of the scaling factor, $P(H)/P(E_0 \cup E_N)$, a point we return to below.

An alternative form of the additive rule

$$P(H \mid E_0 \cup E_N) = \frac{P(H)}{P(E_0 \cup E_N)} \left[ P(E_0 \mid H) + P(E_N \bar{E}_0 \mid H) \right] \tag{3.2}$$

makes it clear that the value of new evidence depends in large part on how redundant it is with existing evidence. The more $E_N$ overlaps $E_0$, the smaller is $E_N$'s contribution to (3.2). Although we do not require even the assumption of conditional independence to obtain this form, its computational utility is limited. It will almost never be the case in an application that all possible combinations of $E_N$ with $E_0$ can be anticipated, much less modeled. However, (3.2) leads to a statement about experimental design that is probably obvious, but we repeat anyway because we think it useful: unless redundancy is required to assure reliability, it is most cost-effective to keep experiments as uncorrelated as possible.
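As a concrete illustration of the additive rule, the following minimal sketch evaluates $P(H \mid E_0 \cup E_N) = \frac{P(H)}{P(E_0 \cup E_N)}[P(E_0 \mid H) + P(E_N \mid H)(1 - P(E_0 \mid H))]$ for one set of numbers. It assumes, beyond what the rule itself states, that the experiments are also conditionally independent given not-H, so that $P(E_0 \cup E_N)$ can be obtained by total probability. The function names and input values are hypothetical.

```python
def union_likelihood(p_e0, p_en):
    """P(E0 or EN | condition) for conditionally independent experiments."""
    return p_e0 + p_en * (1.0 - p_e0)


def additive_update(p_h, p_e0_h, p_en_h, p_e0_not_h, p_en_not_h):
    """Additive rule; P(E0 u EN) is computed by total probability (assumption)."""
    like_h = union_likelihood(p_e0_h, p_en_h)
    like_not_h = union_likelihood(p_e0_not_h, p_en_not_h)
    p_union = p_h * like_h + (1.0 - p_h) * like_not_h
    return p_h * like_h / p_union


# Illustrative inputs: prior 0.3, weak support under H, rarer under not-H.
posterior = additive_update(0.3, 0.25, 0.4, 0.1, 0.1)
print(posterior)
```

Because the bracketed term is a sum, a small value of either conditional probability lowers the result only modestly instead of driving it toward zero.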
Derivation 1. We can also obtain (3.1) by simply applying Bayes' Rule and the definition of the probability of the union of 2 sets,

$$P(H \mid E_0 \cup E_N) = \frac{P(E_0 \cup E_N \mid H)\, P(H)}{P(E_0 \cup E_N)} = \frac{P(H)}{P(E_0 \cup E_N)} \left[ P(E_0 \mid H) + P(E_N \mid H) - P(E_0 E_N \mid H) \right]. \tag{3.3}$$

We obtain (3.1) by applying conditional independence, $P(E_0 E_N \mid H) = P(E_0 \mid H) P(E_N \mid H)$, and simplifying.


Derivation 2. We can also derive the rule from a difference quotient. This derivation emphasizes the dynamic nature of some accrual systems, and is useful in showing that the additive rule can decrease. First, we define some notation,

$$E^n = \bigcup_{i=1}^{n} E_i = E^{n-1} \cup E_n, \tag{3.4}$$

which leads to a natural expression of the change in probability of evidence,

$$\Delta P(E) = P(E^n) - P(E^{n-1}) = P(E_n \bar{E}^{n-1}). \tag{3.5}$$

Defining $\Delta P(H \mid E)$ to conform with $\Delta P(E)$,

$$\frac{\Delta P(H \mid E)}{\Delta P(E)} = \frac{P(H \mid E^n) - P(H \mid E^{n-1})}{P(E^n) - P(E^{n-1})} = \frac{P(H)}{P(E^{n-1} \cup E_n)} \left[ \frac{P(E_n \bar{E}^{n-1} \mid H)}{P(E_n \bar{E}^{n-1})} - \frac{P(E^{n-1} \mid H)}{P(E^{n-1})} \right]. \tag{3.6}$$

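Moving terms around and letting $E_0 = E^{n-1}$, $E_N = E_n$,

$$P(H \mid E_0 \cup E_N) = P(H \mid E_0) + \frac{P(H)}{P(E_0 \cup E_N)} \left[ \frac{P(E_N \bar{E}_0 \mid H)}{P(E_N \bar{E}_0)} - \frac{P(E_0 \mid H)}{P(E_0)} \right] \Delta P(E) \tag{3.7}$$

$$= \frac{P(H)}{P(E_0 \cup E_N)} \left[ P(E_0 \mid H) + P(E_N \bar{E}_0 \mid H) \right]. \tag{3.8}$$

Clearly, $P(H \mid E)$ can decrease when new evidence is added since $\Delta P(H \mid E) < 0$ when

$$P(E_N \mid H) < \left[ \frac{P(E_0 \mid H)}{1 - P(E_0 \mid H)} \right] \left[ \frac{P(E_0 \cup E_N)}{P(E_0)} - 1 \right] \tag{3.9}$$

and $P(E_N \mid H)$ can be arbitrarily small. The numerical results in Section 4 further illustrate this point.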
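A small numerical case shows how belief can fall under the additive rule when the new experiment is far more likely under not-H than under H. The sketch below compares $P(H \mid E_0)$ with $P(H \mid E_0 \cup E_N)$ and evaluates the bound $\frac{P(E_0 \mid H)}{1 - P(E_0 \mid H)} \left[\frac{P(E_0 \cup E_N)}{P(E_0)} - 1\right]$ on $P(E_N \mid H)$. It assumes conditional independence given both H and not-H so that the marginal probabilities can be computed; all names and numbers are illustrative.

```python
def decrease_check(p_h, e0_h, e0_not_h, en_h, en_not_h):
    """Return (P(H|E0), P(H|E0 u EN), bound on P(EN|H) below which belief falls)."""
    p_e0 = p_h * e0_h + (1.0 - p_h) * e0_not_h
    before = p_h * e0_h / p_e0

    union_h = e0_h + en_h * (1.0 - e0_h)              # P(E0 u EN | H)
    union_not_h = e0_not_h + en_not_h * (1.0 - e0_not_h)
    p_union = p_h * union_h + (1.0 - p_h) * union_not_h
    after = p_h * union_h / p_union

    bound = (e0_h / (1.0 - e0_h)) * (p_union / p_e0 - 1.0)
    return before, after, bound


before, after, bound = decrease_check(0.5, 0.6, 0.2, 0.05, 0.5)
print(before, after, bound)
```

Here $P(E_N \mid H) = 0.05$ lies below the bound, and the posterior drops from 0.75 to roughly 0.51 rather than collapsing toward zero.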


Derivation 3. Our main result is an update formula that successively constructs a probability measure over the hypothesis space. One can view this measure as the limiting probability measure for a sequence of exchangeable random variables or their corresponding events that represent the accruing evidence. We consider that the result of each evidence event $E_i H$ yields a conditionally independent sample $P(E_i H)$ from the closed interval [0,1] and that this sample represents the joint probability of having the evidence and the hypothesis. $P(\bar{E}_i H) = P(H) - P(E_i H)$ represents the joint probability of not having the evidence and the hypothesis.


From Equation (2.1) we must have for each choice of k events out of a total of n events

$$P(E_{i_1} H) P(E_{i_2} H) \ldots P(E_{i_k} H) = P(E_{\pi(i_1)} H) P(E_{\pi(i_2)} H) \ldots P(E_{\pi(i_k)} H) \tag{3.10}$$

where $\pi$ is a permutation of the integers 1, 2, ..., n and we have used the conditional independence of the individual experiment events. For this to be true we must have for every k of n evidence events

$$P(E_{i_1} H) \ldots P(E_{i_k} H) = \frac{1}{\binom{n}{k}} \sum_{\pi \in S'_n} P(E_{\pi(1)} H) \ldots P(E_{\pi(k)} H)\, P(\bar{E}_{\pi(k+1)} H) \ldots P(\bar{E}_{\pi(n)} H) \tag{3.11}$$

where it should be noted that the $\pi(i_k)$ are permutations on n letters (e.g. $(\pi(3), \pi(5), \pi(k)) = (7, k+2, n)$) and $S'_n$ denotes only those permutations where $\pi(1) < \pi(2) < \ldots < \pi(k)$. Now we consider only those joint events where we have at least one evidence event and the hypothesis to be consistent with the fact that we did perform experimentation. The expected value of the probability for all these joint events is

$$\sum_{k=1}^{n} \binom{n}{k} P(E_{i_1} H) \ldots P(E_{i_k} H) \tag{3.12}$$

and it is easy to show that

$$P\!\left(H \Big(\bigcup_{k=1}^{n} E_k\Big)\right) = \sum_{k=1}^{n} \binom{n}{k} \left( P(E_{i_1} H) P(E_{i_2} H) \ldots P(E_{i_k} H) \right). \tag{3.13}$$

It is also easy to show that upon rewriting the right hand side of (3.13) we can obtain our update formula. For the sake of brevity we will demonstrate this for only two evidence sources, but the proof for any number of evidence sources is easily derived from an exact but lengthy computation or by mathematical induction. For convenience let $P(H(E_1 \cup E_2)) = p_h$ and $P(E_i H) = p_i$. Combining (3.11) and (3.13) for the case of two evidence sources we get

$$p_h = \binom{2}{1} \frac{1}{\binom{2}{1}} \left( p_1 (1 - p_2) + p_2 (1 - p_1) \right) + \binom{2}{2} \frac{1}{\binom{2}{2}} (p_1 p_2) \tag{3.14}$$

and after multiplying through and collecting terms we get

$$p_h = p_1 + p_2 (1 - p_1) \tag{3.15}$$

or using the value of $p_h$, $p_1$, and $p_2$

$$P(H(E_1 \cup E_2)) = P(E_1 H) + P(E_2 H)\,(1 - P(E_1 H)) \tag{3.16}$$

which is the same as our update formula if the probabilities are rewritten in terms of conditionals.
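The two-source identity $p_h = p_1 + p_2(1 - p_1)$ extends to any number of sources. As a hedged check, the sketch below sums the products over every success/failure pattern that contains at least one success, and compares the result with repeated application of the two-source formula and with the closed form $1 - \prod_i (1 - p_i)$. The helper names and the sample probabilities are hypothetical.

```python
from functools import reduce
from itertools import product


def union_by_patterns(ps):
    """Sum over all 0/1 patterns with at least one 1 of prod p_i or (1 - p_i)."""
    total = 0.0
    for pattern in product((0, 1), repeat=len(ps)):
        if not any(pattern):
            continue  # skip the all-failures pattern
        term = 1.0
        for hit, p in zip(pattern, ps):
            term *= p if hit else (1.0 - p)
        total += term
    return total


def union_by_iteration(ps):
    """Apply acc + p * (1 - acc) once per source, starting from no evidence."""
    return reduce(lambda acc, p: acc + p * (1.0 - acc), ps, 0.0)


ps = [0.2, 0.5, 0.1, 0.7]
by_patterns = union_by_patterns(ps)
by_iteration = union_by_iteration(ps)
closed_form = 1.0 - (0.8 * 0.5 * 0.9 * 0.3)
print(by_patterns, by_iteration, closed_form)
```

The pattern sum grows as $2^n$ terms while the iterated form needs only n steps, which is why the incremental statement of the rule is the practical one.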

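Derivation 4. As a final alternative derivation we obtain our update formula as an expected value of a certain ratio of random variables. Previously we found that we could write our update formula in the form

$$P\!\left(H \,\Big|\, \bigcup_{k=1}^{n} E_k\right) = \frac{P\!\left(\bigcup_{k=1}^{n} E_k \mid H\right)}{P\!\left(\bigcup_{k=1}^{n} E_k\right)}\, P(H) = \frac{\sum_{\pi,k} P(E_{\pi(1)} \ldots E_{\pi(k)} \bar{E}_{\pi(k+1)} \ldots \bar{E}_{\pi(n)} \mid H)}{\sum_{\pi,k} P(E_{\pi(1)} \ldots E_{\pi(k)} \bar{E}_{\pi(k+1)} \ldots \bar{E}_{\pi(n)})}\, P(H) \tag{3.17}$$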

where the intersection events form a partition of the sample space excluding the all evidence complement sets (i.e. $E_{\pi(1)} \ldots E_{\pi(k)} \bar{E}_{\pi(k+1)} \ldots \bar{E}_{\pi(n)}$) and where $\pi$ and k are as defined in Derivation 3. Now if we let $\mathfrak{F}_n$ be the $\sigma$-algebra generated by the partition and the hypothesis event H we can rewrite this as

$$\frac{\sum_{\pi,k} P(E_{\pi(1)} \ldots E_{\pi(k)} \bar{E}_{\pi(k+1)} \ldots \bar{E}_{\pi(n)} \mid H)}{\sum_{\pi,k} P(E_{\pi(1)} \ldots E_{\pi(k)} \bar{E}_{\pi(k+1)} \ldots \bar{E}_{\pi(n)})}\, P(H) = E\!\left( \frac{\sum_{\pi,k} I_{\cap EH}}{\sum_{\pi,k} I_{\cap E}} \,\Big|\, \mathfrak{F}_n \right) \tag{3.18}$$

where $\cap EH$ and $\cap E$ are shorthand for the intersection sets of the partition and $I_{\cap EH}$ and $I_{\cap E}$ are their indicator functions. Thus we have rewritten the result of our update formula as an expectation, which by elementary martingale theory implies that the result of our additive update process is a martingale. This allows a lot of powerful theoretical results to be applied to the investigation of the properties of our method. In Section 5.2 we further rewrite our update rule to relate it to a modern branch of martingale theory about multiplicative random processes.


3.2  Combined  Rule 

Although  most  problem  domains  are  based  on  weak  evidence,  some  contain  strong  evidence. 
Thus  we  note  a  simple  method  for  combining  strong  evidence,  Es,  with  weak  and  vice  versa. 


Algorithm for Combining Weak and Strong Evidence. If $E_S$ is independent of $E_1$ and $E_2$, and $E_1$ and $E_2$ are conditionally independent events, then the probability of H given that $E_S$ must be observed and that we can observe either $E_1$ or $E_2$ (or both) is

$$P(H \mid E_S (E_1 \cup E_2)) = \left[ \frac{P(E_S \mid H)}{P(E_S)} \right] P(H \mid E_1 \cup E_2) = \left[ \frac{P(E_S \mid H)}{P(E_S)} \right] \left[ \frac{P(H)}{P(E_1 \cup E_2)} \right] \left[ P(E_1 \mid H) + P(E_2 \mid H)\,(1 - P(E_1 \mid H)) \right] \tag{3.19}$$

i.e., the new rule is the product of the Bayesian likelihood ratio of $E_S$ with the Additive Rule.

When additional strong evidence is obtained, it is fused into $P(H \mid \cdot)$ by applying ordinary multiplicative Bayesian updating. New weak evidence is accrued to $P(H \mid E_1 \cup E_2)$ by applying our Weak Rule. The basic algorithm is depicted in Figure 3.1. Strong and weak evidence streams are maintained separately and are updated by the multiplicative or additive rules, respectively. When new evidence is obtained, it is first accrued to the appropriate stream, and then the streams are combined according to (3.19).
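The combination of one strong source with a weak stream might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the weak stream is accrued with the additive rule, $P(E_1 \cup E_2)$ is obtained by total probability assuming conditional independence given not-H as well, and the strong source enters only through the ratio $P(E_S \mid H)/P(E_S)$. All function names and numbers are hypothetical.

```python
def union_likelihood(likelihoods):
    """P(at least one weak experiment succeeds | condition), accrued additively."""
    acc = 0.0
    for p in likelihoods:
        acc = acc + p * (1.0 - acc)
    return acc


def weak_posterior(p_h, weak_given_h, weak_given_not_h):
    """Additive rule for the weak stream; the marginal uses total probability."""
    like_h = union_likelihood(weak_given_h)
    like_not_h = union_likelihood(weak_given_not_h)
    p_union = p_h * like_h + (1.0 - p_h) * like_not_h
    return p_h * like_h / p_union


def combined_posterior(p_h, p_es_given_h, p_es, weak_given_h, weak_given_not_h):
    """Product of the strong-evidence ratio with the additive-rule posterior."""
    strong_ratio = p_es_given_h / p_es
    return strong_ratio * weak_posterior(p_h, weak_given_h, weak_given_not_h)


result = combined_posterior(0.2, 0.9, 0.6, [0.3, 0.4], [0.1, 0.2])
print(result)
```

Keeping the two streams in separate functions mirrors the description of maintaining strong and weak evidence separately and combining them only at the end.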


4  Comparison  With  Multiplicative  Bayesian  Updating 

In this section we compare the additive rule to several alternative methods of evidence accrual. We begin by contrasting the additive rule with multiplicative Bayesian updating. The additive rule is Bayesian, but of course it is additive, not multiplicative. Hence it is not as sensitive to anomalous evidence as is ordinary multiplicative Bayesian updating, a fact that we discuss through analytical and numerical results. Furthermore, numerical results indicate that the additive rule converges to the actual value of P(H) as evidence is accrued.

The Multiplicative Rule is based on the notion that all evidence is strong, and therefore, that every experiment can find the data it requires in a given data set. Multiplicative evidence accumulation consists of progressively restricting attention to just those individuals that are strongly supported by all experiments. It is implicitly assumed that individuals that satisfy the hypothesis will have strong support from all experiments. We have already argued that this is unrealistic. Even data sets drawn from objects of interest may not contain data required to support some experiments. Furthermore, the Multiplicative Rule assumes that every experiment, $E_i$, is good in the sense that if the data required by $E_i$ is in the data set, then $P(E_i \mid H) \gg 0$ and $P(E_i \mid \bar{H}) \ll 1$.


4.1  Analysis. 

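The additive update rule has been previously written with the union of evidence expanded using the inclusion-exclusion principle, that is

$$P\!\left(\bigcup_{k=0}^{n} E_k\right) = \sum_{i} P(E_i) - \sum_{i<j} P(E_i E_j) + \ldots + (-1)^{n} P(E_0 E_1 \ldots E_n). \tag{4.1}$$

Alternatively, we could have expanded the probability of the union of evidence as a partition

$$P\!\left(\bigcup_{k=0}^{n} E_k\right) = \sum_{k=0}^{n} P(\bar{E}_0 \bar{E}_1 \ldots \bar{E}_{k-1} E_k). \tag{4.2}$$

Using (4.2) we can rewrite the additive update rule as

$$P\!\left(H \,\Big|\, \bigcup_{k=0}^{n} E_k\right) = \frac{P(H)}{P\!\left(\bigcup_{k=0}^{n} E_k\right)} \left[ \sum_{k=0}^{n} P(\bar{E}_0 \bar{E}_1 \ldots \bar{E}_{k-1} E_k \mid H) \right]. \tag{4.3}$$

For comparison we recall the multiplicative rule in a similar form

$$P\!\left(H \,\Big|\, \bigcap_{k=0}^{n} E_k\right) = \frac{P(H)}{P\!\left(\bigcap_{k=0}^{n} E_k\right)} \left[ \prod_{k=0}^{n} P(E_k \mid H) \right]. \tag{4.4}$$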

These two equations clearly display some primary differences between the two updating schemes. First, the limiting behavior of the additive rule is governed by a sum, which is relatively stable with respect to variations in individual terms, versus the multiplicative rule which is governed by a product that is highly variable due to variations in individual terms. In fact, a worst case for the multiplicative rule is where one of the $P(E_i \mid H)$ terms is equal to zero, forcing all subsequent updates to be equal to zero. As we have pointed out earlier, this conditional probability could be zero or near zero for a variety of reasons and is in fact the reason a more robust update formula is needed. Another obvious difference is in the normalizing terms. The additive rule has a normalizing term which is monotonically increasing and approaching at most the value 1. The multiplicative rule has a normalizing term which is monotonically decreasing and approaching the value 0. Thus variations in the selection and ordering of the evidence experiments $E_i$ may create large fluctuations in the value of the update.
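The worst case described above is easy to reproduce. The following sketch compares only the bracketed likelihood terms of the two rules, the product $\prod_k P(E_k \mid H)$ and, under conditional independence, the partition sum, which equals $1 - \prod_k (1 - P(E_k \mid H))$. Normalizing factors are deliberately left out, and the function names and values are hypothetical.

```python
import math


def multiplicative_term(likelihoods):
    """Product of P(E_k | H): the bracketed term of the multiplicative rule."""
    return math.prod(likelihoods)


def additive_term(likelihoods):
    """P(union of E_k | H) under conditional independence: the additive bracket."""
    return 1.0 - math.prod(1.0 - p for p in likelihoods)


good = [0.8, 0.7, 0.9]
with_failure = [0.8, 0.7, 0.0, 0.9]  # one experiment fails outright

print(multiplicative_term(good), multiplicative_term(with_failure))
print(additive_term(good), additive_term(with_failure))
```

A single failed experiment zeroes the product, while the additive term is unchanged, because a term with $P(E_k \mid H) = 0$ simply contributes nothing to the sum.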


4.2  Probability  Update  Simulations. 

Section  4.1  compared  several  mathematical  properties  for  the  multiplicative  and  additive  rules.  In 
this  section  we  want  to  present  some  numerical  results  that  provide  striking  evidence  for  the 
robustness  of  the  additive  update  rule  versus  the  multiplicative  update  rule. 

Figure  4.1  is  a  plot  of  the  results  of  choosing  at  random  the  component  probabilities  and  then 
plugging  these  into  the  two  formulas  for  multiplicative  and  additive  updating. 
The component probabilities for two evidence gathering experiments and a hypothesis are chosen at random with appropriate conditions on some of the probabilities. Specifically these component probabilities are the probabilities $P(E_1)$ and $P(E_2)$ of getting evidence $E_1$ or $E_2$, the joint probability $P(E_1 E_2)$ of getting $E_1$ and $E_2$, the joint probabilities $P(E_1 H)$ and $P(E_2 H)$ of getting $E_1$ and the hypothesis or $E_2$ and the hypothesis, and finally the joint probability $P(E_1 E_2 H)$ of getting evidence $E_1$, $E_2$ and the hypothesis. The conditions are the following:

$$P(E_1 E_2) \le \min\left(P(E_1), P(E_2)\right) \tag{4.5}$$

$$P(E_1 H) \le P(E_1) \tag{4.6}$$

$$P(E_2 H) \le P(E_2) \tag{4.7}$$

$$P(E_1 E_2 H) \le \min\left(P(E_1 H), P(E_2 H)\right) \tag{4.8}$$

Within the constraints imposed by these conditions the probabilities are chosen at random. The forms of the multiplicative and additive rules are equivalent to our previous forms but allow these random probabilities to be input. The multiplicative update rule that we used is

$$P(H \mid E_1 E_2) = \frac{P(E_1 E_2 H)}{P(E_1 E_2)} \tag{4.9}$$

and the additive update rule that we used is

$$P(H \mid E_1 \cup E_2) = \frac{P(E_1 H) + P(E_2 H) - P(E_1 E_2 H)}{P(E_1) + P(E_2) - P(E_1 E_2)}. \tag{4.10}$$

Figure 4.1 clearly shows the stability of the additive update process versus the multiplicative update process. One can view the simulation as taking a prior probability distribution over the prior probability $P(H \mid E_1)$ which is uniform over [0,1] and transforming it into the posterior distribution over the posterior probabilities $P(H \mid E_1 E_2)$ which is shown in Figure 4.1 for both of the multiplicative or additive update rules. The mean of the posterior probabilities is .22 for the multiplicative case and .50 for the additive case. This shows that the multiplicative update rule on average computes a posterior probability that is about one-half the value of the posterior probability for the additive update rule. Also, the distribution of the posterior probabilities and the distribution of prior probabilities are closer for the additive rule than for the multiplicative rule. Thus the additive update process is much more conservative than the multiplicative process.


Another simulation shows the result of iterating our update rule for 2, 16, and 128 experiments.
Figure  4.2  is  a  plot  of  the  results  from  100,000  runs  for  our  additive  update  rule  and  Figure  4.3  is 
a  plot  using  the  same  number  of  runs  for  the  multiplicative  rule.  The  central  limit  theorem-type 
convergence  with  increasing  number  of  experiments,  which  is  expected  by  the  martingale  prop¬ 
erty,  is  clearly  evident  in  Figure  4.2  while  no  difference  is  discernible  in  Figure  4.3.  Also,  the 
same  tendency  toward  low  values  of  the  iterated  update  is  found  for  the  multiplicative  rule  while 
the additive updates converge toward what one would expect for random choices of hypothesis
and  evidence  probabilities. 
4.3 Likelihood Simulations.

We compare the relative effect of new evidence on likelihood ratios,

$$\Lambda_M = \frac{P(E_N \mid H)\, P(E_0 \mid H)}{P(E_N \mid \bar{H})\, P(E_0 \mid \bar{H})} \tag{4.11}$$

derived from the Multiplicative Rule and,

$$\Lambda_A = \frac{P(E_0 \mid H) + P(E_N \mid H)\,(1 - P(E_0 \mid H))}{P(E_0 \mid \bar{H}) + P(E_N \mid \bar{H})\,(1 - P(E_0 \mid \bar{H}))} \tag{4.12}$$

derived from the Additive Rule. To simplify the discussion, we have dropped the ratio $P(H)/P(\bar{H})$ that multiplies both $\Lambda_M$ and $\Lambda_A$.


To indicate the effect of new evidence we can plot $\Lambda_A$ and $\Lambda_M$ against $P(E_N \mid H)$ and $P(E_N \mid \bar{H})$ (Figures 4.4-4.9). To obtain the figures, we suppose that both rules start with the same prior likelihood,

$$\Lambda_0 = \frac{P(E_0 \mid H)}{P(E_0 \mid \bar{H})} \tag{4.13}$$

and we also fix $P(E_0 \mid H)$ since the Additive Rule requires those values explicitly. When $\Lambda_0 = 1.0$, i.e. when the prior likelihood is indifferent between H and $\bar{H}$, and when $P(E_0 \mid H) = 0.25$, we have Figures 4.4 and 4.5. Other values of $P(E_0 \mid H)$ yield qualitatively similar results. Clearly $\Lambda_A$ is restricted to a narrower range than $\Lambda_M$. Furthermore, $\Lambda_M$ drops to values less than 1 very quickly.


This is somewhat easier to see if we also fix $P(E_N \mid \bar{H})$ and plot $\Lambda_A$ and $\Lambda_M$ against $P(E_N \mid H)$ (Figures 4.6-4.9). The statistics associated with the figures give maximum and minimum values of the $\Lambda$'s, the slope of the $\Lambda$ curves, and the distance of the $\Lambda$ curves from $\Lambda_0$. The figures indicate that large (small) values of $P(E_N \mid H)$ can lead to $\Lambda_M$ that is much larger (smaller) than $\Lambda_0$, while $\Lambda_A$ stays closer to $\Lambda_0$ throughout the range of $P(E_N \mid H)$. This is consistent with our earlier observation that the Additive Rule gives much less weight to outliers than does the Multiplicative Rule.

Comparing Figures 4.6 and 4.7 indicates that the effect of new positive evidence, $P(E_N \mid H)$, on $\Lambda_A$ is reduced if relatively strong evidence ($P(E_0 \mid H) = 0.25$ vs. $P(E_0 \mid H) = 0.50$) has already been accrued. The magnitude of previous evidence has no effect on $\Lambda_M$ since it does not depend on $P(E_0 \mid H)$ directly, and thus cannot distinguish cases where prior evidence is negligible from those in which quite a lot of evidence has been accumulated. Figures 4.6 and 4.8 show that high values of new negative evidence ($P(E_N \mid \bar{H}) = 0.25$ vs. $P(E_N \mid \bar{H}) = 0.50$) reduce both $\Lambda_A$ and $\Lambda_M$. However, the effect on $\Lambda_M$ is more pronounced: the slope of the $\Lambda_M$ curve is reduced by half while the slope of the $\Lambda_A$ curve is basically unchanged. Maximum and minimum values of $\Lambda_A$ and $\Lambda_M$ show similar effects. Figure 4.9 shows the effect of the prior likelihood, $\Lambda_0$. The higher $\Lambda_0$, the greater is the effect of new evidence on $\Lambda_M$. On the other hand, $\Lambda_A$ is restricted to a narrower range that is closer to $\Lambda_0$.
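The comparison of the two likelihood ratios can be reproduced in a few lines. This sketch assumes the setting described above: both rules start from the same prior likelihood $\Lambda_0 = P(E_0 \mid H)/P(E_0 \mid \bar{H})$, $P(E_0 \mid H)$ and $P(E_N \mid \bar{H})$ are held fixed, and $P(E_N \mid H)$ is swept over a grid. The grid spacing and function names are hypothetical choices for illustration, and no attempt is made to match the statistics printed with the original figures.

```python
def lambda_multiplicative(lambda_0, p_en_h, p_en_not_h):
    """Multiplicative likelihood ratio: prior likelihood times the new ratio."""
    return lambda_0 * p_en_h / p_en_not_h


def lambda_additive(lambda_0, p_e0_h, p_en_h, p_en_not_h):
    """Additive likelihood ratio; P(E0 | not H) is recovered from lambda_0."""
    p_e0_not_h = p_e0_h / lambda_0
    numerator = p_e0_h + p_en_h * (1.0 - p_e0_h)
    denominator = p_e0_not_h + p_en_not_h * (1.0 - p_e0_not_h)
    return numerator / denominator


grid = [i / 20 for i in range(1, 20)]  # P(EN | H) from 0.05 to 0.95
lm = [lambda_multiplicative(1.0, p, 0.25) for p in grid]
la = [lambda_additive(1.0, 0.25, p, 0.25) for p in grid]

range_m = max(lm) - min(lm)
range_a = max(la) - min(la)
print(range_m, range_a)
```

For these inputs the additive ratio spans less than half the range of the multiplicative ratio, and both equal $\Lambda_0$ when $P(E_N \mid H) = P(E_N \mid \bar{H})$.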
Figure 4.6 - Additive vs. Multiplicative Rule: Low $P(E_0 \mid H)$ ($\Lambda_0 = 1.0$, $P(E_0 \mid H) = 0.25$, $P(E_N \mid \bar{H}) = 0.25$)

Figure 4.7 - Additive vs. Multiplicative Rule: High $P(E_0 \mid H)$

Figure 4.8 - Additive vs. Multiplicative Rule: High $P(E_N \mid \bar{H})$

Figure 4.9 - Additive vs. Multiplicative Rule: Multiple $\Lambda_0$


5  Comparison  with  Other  Additive  Rules 

We also compare the additive rule to consensus rules and Jeffrey's rule. The additive rule is a kind of consensus rule since it builds up $P(H \mid \cup_i E_i)$ as a weighted average of evidence sources, but it is derived from simple probability arguments and is not ad hoc. Jeffrey's rule follows from the additive rule when the $E_i$ are a complete partition of the event space. Activation of artificial "neurons" is usually achieved by consensus, and we can write the additive rule so that it looks like a method for activating a neuron. When we do that, we obtain an expression for the weights of an artificial neural system that might be useful in defining learning dynamics.

5.1  Consensus  Rules 

A branch of applied probability is concerned with combining the opinions of several experts or the subjective probability assessments of several experts. Consensus rules are one general method for combining these opinions or probabilities. These have been explored by a variety of researchers. A modern survey of the necessary properties of general consensus rules and some additional mathematical properties of linear consensus rules is given in Berenstein, Kanal and Lavine (1986). Another good reference is found in DeGroot (1974). An early reference termed the group of opinions an opinion pool. We now describe two types of opinion pools, linear and independent, and mention their relationship to our additive update rule.

The linear opinion pool combines the group of subjective probability distributions in the form

$$P\!\left(H \,\Big|\, \bigcup_{k=0}^{n} E_k\right) = \sum_{k=0}^{n} w_k\, P(H \mid E_k) \tag{5.1}$$

where the weights $w_k$ are positive and sum to 1. Due to the ad hoc nature of this formula, there is the problem of determining the weights in a probabilistically consistent manner. Even more important is the fact that the formula does not allow the reinforcement of negative evidence as the evidence experiments increase because the sum on the right-hand side of the formula is monotonically increasing.
The independent opinion pool can be written

$$P\!\left(H \,\Big|\, \bigcup_{k=0}^{n} E_k\right) = \alpha \prod_{k=0}^{n} P(H \mid E_k) \tag{5.2}$$

where $\alpha$ is a normalizing constant and the evidence experiments are considered independent. Unless the design and scheduling of experiments is done very carefully the independence assumption may be far from valid. Also, although evidence can negatively reinforce, reinforcement may be unjustifiably extreme (see Berger, 1985).

Both  of  the  above  formulas  are  ad  hoc  and  require  care  when  choosing  appropriate  and  consistent 
weights.  The  linear  pool  formula  does  not  allow  for  negative  reinforcement  and  the  independent 
pool  formula  can  be  unstable  with  increasing  evidence.  Our  additive  update  rule  is  rigorously  and 
consistently  derived  from  basic  probabilistic  axioms  and  models.  Also,  as  shown  in  Section  3.1, 
our  additive  update  rule  allows  for  negative  reinforcement  because  the  weights  are  not  required  to 
sum  to  1  for  the  result  to  be  consistent  as  a  probability.  The  negative  reinforcement  is  also  shown 
in  the  numerical  results  presented  in  Section  4.  Finally,  as  discussed  previously,  our  additive 
update rule changes conservatively with respect to accumulating evidence and thus reinforcement
is  stable,  especially  for  missing  or  bad  evidence  outliers. 

5.2  Jeffrey's  Rule 

Jeffrey (1965) presented a rule that is an alternative to the usual multiplicative update rule based on Bayes rule for revising a probability P to a new probability P* based on new probabilities $P^*(E_i)$ on a partition $\{E_i\}_{i=1}^{n}$. Jeffrey's rule is written

$$P^*(H) = \sum_{i=1}^{n} P(H \mid E_i)\, P^*(E_i) \tag{5.3}$$

and is judged applicable if $P^*(H \mid E_i) = P(H \mid E_i)$ for all H and i. This condition is satisfied for sequences $E_i$ of exchangeable random variables and in fact Jeffrey's rule is derivable from the basic formula for total probability and exchangeability. An important property of Jeffrey's rule is that it is the natural rule for revising probability if, given a prior P, a partition $\{E_i\}$, and a new measure P* on $\{E_i\}$, one wants to find the "closest" measure to P that agrees with P* on the partition and take this as defining P* on the whole space. This is true for any of several common ways of defining closeness between measures on a countable sample space. See Diaconis and Zabell (1982) for a complete discussion.

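Now recalling our additive update rule written in partition form

$$P\!\left(H \,\Big|\, \bigcup_{k=0}^{n} E_k\right) = \frac{P(H)}{P\!\left(\bigcup_{k=0}^{n} E_k\right)} \left[ \sum_{k=0}^{n} P(\bar{E}_0 \bar{E}_1 \ldots \bar{E}_{k-1} E_k \mid H) \right] \tag{5.4}$$

this can be rewritten as

$$P^*\!\left(H \Big(\bigcup_{k=0}^{n} E_k\Big)\right) = \sum_{k=0}^{n} P^*(H \mid \bar{E}_0 \bar{E}_1 \ldots \bar{E}_{k-1} E_k)\, P^*(\bar{E}_0 \bar{E}_1 \ldots \bar{E}_{k-1} E_k) \tag{5.5}$$

where we have replaced P by P* to be consistent with the notation in Jeffrey's rule. Now because we are working with exchangeable random variables and the evidence sets induced by them, we can replace the conditional P* with P to get

$$P^*\!\left(H \Big(\bigcup_{k=0}^{n} E_k\Big)\right) = \sum_{k=0}^{n} P(H \mid \bar{E}_0 \bar{E}_1 \ldots \bar{E}_{k-1} E_k)\, P^*(\bar{E}_0 \bar{E}_1 \ldots \bar{E}_{k-1} E_k). \tag{5.6}$$

In this form our additive update rule is directly analogous to Jeffrey's rule and can in fact be used to derive Jeffrey's rule in the case that the $\{E_i\}$ constitute a partition of the entire sample space. This implies that our additive update rule can also be derived as the "closest" measure to P that agrees with P* on the partition.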

5.3 Comparison to NN

Weighted  consensus  building  is  a  common  approach  to  activating  the  "neurons"  tha,  populate  arti- 


ficial  neural  systems.  Our  additive  rule  can  be  written  in 


form  that  is  similar  to  the  formula  used 
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to  represent  activity  in  artificial  neurons, 


P(H\Cj£:)  =  i>,P(£,) '  (57) 

1  i  =  1 

where  the  process  of  making  a  decision  regarding  the  value  of  the  hypothesis  is  similar  to  the  non¬ 
linear  thresholding  found  in  artificial  neural  networks. 


Here  P(Hlu,Ej)  is  equivalent  to  the  activity  in  a  “goal”  neuron  that  receives  input  from  n  “input” 
neurons,  each  of  which  has  its  own  activity,  P(E,).  The  P(Ej)  are  prior  probalities  calculated  by 
earlier  portions  of  the  net.  The  input  vector  (P(Ej), ...,  P(En))  is  filtered  through  weights  (wj, 
wn)  that  arc  learned  in  artificial  neural  systems.  From  our  additive  rule  we  have 


i  -  1 


/>(//|£t.)  ]JP(E^H) 

- ‘  (5.8) 


PiKJEi) 

l 

so the additive rule corresponds to a net in which evidence sources compete to activate the goal
neuron. When the evidence sources are disjoint, we have w_i = P(H | E_i). Equation (5.8) also relates
the additive rule to the kind of linear opinion pooling discussed by Berger (1985). The main
technical difference is that linear pooling requires Σ_i w_i = 1; more important is the fact that the
w_i in linear pooling are ad hoc.
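
The weighted-sum view can be sketched numerically. The sketch below assumes that (5.8) reads w_i = P(H | E_i) ∏_{j<i} P(Ē_j | H) / P(∪_k E_k), where Ē_j denotes the complement of E_j, and that P(H | E_i) is obtained from Bayes' rule. The function names and the input probabilities are hypothetical.

```python
from math import prod


def additive_weights(p_h, p_e, p_e_given_h, p_union):
    """Weights w_i under the assumed reading of (5.8).

    p_h         : prior P(H)
    p_e         : list of P(E_i)
    p_e_given_h : list of P(E_i|H)
    p_union     : P(union of all E_i)
    """
    weights = []
    for i, (pe, peh) in enumerate(zip(p_e, p_e_given_h)):
        p_h_given_e = p_h * peh / pe                  # Bayes' rule for P(H|E_i)
        # P(complement of E_j | H) for all earlier sources j < i
        earlier_missed = prod(1.0 - q for q in p_e_given_h[:i])
        weights.append(p_h_given_e * earlier_missed / p_union)
    return weights


def goal_activity(weights, p_e):
    """Activity of the "goal" neuron: sum_i w_i * P(E_i), as in (5.7)."""
    return sum(w * pe for w, pe in zip(weights, p_e))


# Hypothetical two-source example (illustrative values only).
p_h = 0.5
p_e = [0.4, 0.3]
p_e_given_h = [0.6, 0.4]
p_union = 0.55

weights = additive_weights(p_h, p_e, p_e_given_h, p_union)
activity = goal_activity(weights, p_e)
print(weights, activity)
```

In this example the weights do not sum to 1, which illustrates the technical difference from linear opinion pooling noted above.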


Section  6  Future  Directions 

In this section we describe several aspects of the additive update process that need more research
but show promising directions for relating it to several other areas of current research in
probability, measure theory and dynamical systems.


Section  6.1  Simulation 

The Representation Theorem for sequences of exchangeable random variables that was discussed
in Section 2.3 allows us to simulate the long-term behavior of such sequences and thus understand
the long-term properties of evidence sequences and the additive update process. Recall the
theorem

$$P(E_i \in A_i,\ i = 1, \ldots, n) = \int_{\Omega} \prod_{i=1}^{n} P(A_i) \, d\mu(P) \tag{6.1}$$

where E_i is the sequence of exchangeable random variables, Ω is the space of all probability
measures, and μ(P) is a measure on the space of probability measures. The theorem states that the
probability of a set defined by the conditions E_i ∈ A_i is given by a mixture of power probabilities
weighted by a measure on the space of probability measures. One can also view μ(P) as a prior on
the space of probability measures that gets updated to a posterior probability P based on power
probabilities on the sets defined by the conditions E_i ∈ A_i. Thus, one can simulate and study the
statistical properties of the probabilities P on sets of exchangeable random variables by sampling
from Ω using μ(P). Dubins (1967) discusses methods for sampling random distribution functions
using a natural measure on the space of probability measures. In the future we will use this
approach to study the long-term and aggregate properties of sequences of exchangeable random
variables that represent evidence updates.
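
The sampling idea can be sketched in the simplest special case. The sketch below assumes binary variables and a Beta mixing measure standing in for μ(P). Both are illustrative assumptions, not the natural measure discussed by Dubins. A success probability θ is drawn from the mixing measure, then the sequence is generated i.i.d. given θ, and the simulated probability of an all-ones sequence is compared with the mixture integral ∫ θ^n dμ(θ).

```python
import random


def sample_exchangeable(n, a, b, rng):
    """Draw one exchangeable binary sequence of length n.

    theta is sampled from a Beta(a, b) mixing measure (an assumed stand-in
    for mu), then the n variables are i.i.d. Bernoulli(theta) given theta.
    """
    theta = rng.betavariate(a, b)
    return [1 if rng.random() < theta else 0 for _ in range(n)]


def exact_all_ones(n, a, b):
    """Mixture integral E[theta^n] for theta ~ Beta(a, b)."""
    p = 1.0
    for k in range(n):
        p *= (a + k) / (a + b + k)
    return p


rng = random.Random(0)                 # fixed seed for reproducibility
n, a, b, trials = 3, 2.0, 2.0, 20000

# Monte Carlo estimate of P(E_1 = 1, ..., E_n = 1)
hits = sum(all(sample_exchangeable(n, a, b, rng)) for _ in range(trials))
estimate = hits / trials
exact = exact_all_ones(n, a, b)
print(estimate, exact)
```

With these parameters the mixture integral is (2/4)(3/5)(4/6) = 0.2, and the Monte Carlo estimate should agree with it to within sampling error.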


Section  6.2  Kahane  Martingale 

Our additive update formula can be viewed as producing a random variable which is an
expectation of sums of random variables (i.e. indicator functions) over a σ-algebra generated by a
partition formed from the evidence and hypothesis support sets. In Section 3.1 we derived that


This  allows  us  to  write  our  additive  update  formula  in  a  multiplicative  form 

fl  0  EtH+  XEkH ) 

E  _ 

TKh  +  h? 

K  k=  1 

The  expected  value  of  the  n-th  term  in  the  product  is  1,  because  before  we  perform  the  experiment 
we  consider  the  evidence  sets  to  be  entirely  contained  within  the  hypothesis  set  H.  Thus  we  may 
consider  the  iterated  product  of  indicator  function  ratios  as  a  special  type  of  martingale  discussed 
by  Kahane  (1987).  These  martingales  are  the  basic  model  for  a  variety  of  multiplicative  random 
process applications such as random coverings, certain branching processes, and the cascade
processes of Mandelbrot used for modeling turbulence. Kahane studies the limit distribution of the
random products and describes their support sets as well as the analytic properties of the random
products viewed as an operator on prior measures P(H). We will apply these results to our case
and build on them to develop a better understanding of the hypothesis testing theory.
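
The martingale property of a product of mean-one factors can be illustrated with a generic simulation. The sketch below does not use the indicator-function ratios of the additive rule. It assumes independent factors drawn uniformly from [0, 2], a hypothetical choice made only because such factors are non-negative with expected value 1.

```python
import random


def product_path(n, rng):
    """Running product of n independent non-negative factors with mean 1."""
    value = 1.0
    path = []
    for _ in range(n):
        value *= rng.uniform(0.0, 2.0)   # assumed factor law, E[factor] = 1
        path.append(value)
    return path


rng = random.Random(1)                   # fixed seed for reproducibility
n_steps, n_paths = 5, 100000

finals = [product_path(n_steps, rng)[-1] for _ in range(n_paths)]
mean_final = sum(finals) / n_paths             # stays near 1 (martingale)
median_final = sorted(finals)[n_paths // 2]    # typical path is well below 1
print(mean_final, median_final)
```

The sample mean of the product stays near 1 while the median falls well below 1. This gap between average and typical behavior is one reason the limit distribution of such products is of interest.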


Section  6.3  Logistic  Map 

If one assumes that E[P(E_0 | H)] = E[P(E_n | H)] = 2P_0, then our additive update rule becomes


$$P(H \mid E_0 \cup E_n) = \frac{P(H)}{P(E_0 \cup E_n)} \left[ 2P_0 + 2P_0 (1 - 2P_0) \right] \tag{6.5}$$

or

$$P_n = 4 \alpha_n P_0 (1 - P_0) \tag{6.6}$$

where α_n = P(H)/P(E_0 ∪ E_n) and P_n = P(H | E_0 ∪ E_n). Now we can see that our update rule is
related to the well-known quadratic iterator map. Depending on the choice of α_n, this mapping may
exhibit chaotic behavior. We propose to investigate the behavior of this mapping for α_n in the
appropriate range for our application. We will also investigate the effect of the mapping if actual
random variables are input versus the expected value of the random variables.
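
The connection to the quadratic map can be explored with a short sketch. It first checks the algebraic step from (6.5) to (6.6), namely 2P_0 + 2P_0(1 - 2P_0) = 4P_0(1 - P_0), and then iterates p ↦ 4αp(1 - p). Holding α fixed across iterations and feeding each output back in as the next input are simplifying assumptions made here for illustration, and the function names and starting values are hypothetical.

```python
def quadratic_step(p, alpha):
    """One application of the map p -> 4 * alpha * p * (1 - p), as in (6.6)."""
    return 4.0 * alpha * p * (1.0 - p)


def iterate(p0, alpha, steps):
    """Orbit of the map starting from p0, with alpha held fixed (assumption)."""
    orbit = [p0]
    for _ in range(steps):
        orbit.append(quadratic_step(orbit[-1], alpha))
    return orbit


# Algebraic identity linking the bracket in (6.5) to the form in (6.6).
p0 = 0.2
lhs = 2 * p0 + 2 * p0 * (1 - 2 * p0)
rhs = 4 * p0 * (1 - p0)

# alpha = 0.5: the orbit settles at the fixed point 1 - 1/(4*alpha) = 0.5.
stable = iterate(0.2, 0.5, 50)

# alpha = 1.0: the classical chaotic regime; compare two nearby starting points.
chaotic_a = iterate(0.2, 1.0, 50)
chaotic_b = iterate(0.2 + 1e-9, 1.0, 50)

print(lhs, rhs, stable[-1])
print(chaotic_a[-5:], chaotic_b[-5:])
```

For intermediate values of α the map passes through the familiar period-doubling regime, so the range of α_n that actually arises in an application determines which behavior is relevant.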